18 Commits

Author SHA1 Message Date
Frej Drejhammar
6700b164d0 Merge branch 'PR/293'
Closes #292
2022-10-23 14:47:04 +02:00
chrisjbillington
13c273f10c Resolve unicode escape sequences not being processed correctly
In `process_unicode_escape_sequences()`, any backslash escape sequences
in the original string are escaped upon the first
`.encode('unicode-escape')` and therefore round-trip the sequence of
`.encode('unicode-escape').decode('unicode-escape')`.

That is not what we want - we want these sequences to be passed-through
the `.encode` unchanged, so that they will be converted to the
character they represent upon `.decode()`.

This patch changes the `.encode()` step to pass through any ascii
characters unchanged, only escaping non-ascii characters. This ensures
any existing backslash escape sequences will be interpreted as the
character they represent upon `.decode()`.
2022-10-23 11:51:33 +11:00
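The difference between the two `.encode()` steps described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the project's code: the function names `old_process` and `new_process` and the sample string are made up for illustration, and only the chains of `.decode()`/`.encode()` calls are taken from the commit.

```python
def old_process(s: bytes) -> bytes:
    # Chain used before the fix: 'unicode-escape' also escapes existing
    # backslashes, so a literal "\u00e9" in the input survives the round trip.
    return (
        s.decode('utf8')
        .encode('unicode-escape')
        .decode('unicode-escape')
        .encode('utf8')
    )


def new_process(s: bytes) -> bytes:
    # Chain used after the fix: ASCII (including backslashes) passes through
    # unchanged, only non-ASCII characters become escape sequences.
    return (
        s.decode('utf8')
        .encode('ascii', 'backslashreplace')
        .decode('unicode-escape')
        .encode('utf8')
    )


# Illustrative input: a literal backslash-u escape followed by a real
# non-ASCII character (u-umlaut).
sample = 'Ren\\u00e9 \u00fc'.encode('utf8')
resolved = 'Ren\u00e9 \u00fc'.encode('utf8')

print(old_process(sample) == sample)    # True: escape sequence left untouched
print(new_process(sample) == resolved)  # True: escape sequence resolved
```

The non-ASCII character in the sample shows why the intermediate encode step is still needed: without it, `.decode('unicode-escape')` would have to be applied to text that is not plain ASCII.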
Frej Drejhammar
667404e836 Merge branch 'PR291' 2022-09-21 18:31:16 +02:00
Nicolas Vanhoren
38e236962d Update README.md to change recommendation for crlf filtering 2022-09-21 01:37:39 +02:00
Frej Drejhammar
dbb8158527 Merge branch 'frej/submodule-doc-improvement' 2022-02-10 20:05:07 +01:00
Frej Drejhammar
bb0bcda7ba Merge branch 'frej/fix-re-future-warning' 2022-02-10 20:04:14 +01:00
Frej Drejhammar
838b654614 Remove inconsistencies from submodule documentation
The submodule documentation is not consistent with regards to the
example directory structure. Update the example to be consistent.

Closes #277.
2022-02-09 15:58:48 +01:00
Frej Drejhammar
f179afce65 Fix FutureWarning about nested sets in re
Since Python 3.7 the re module warns for syntax which could, in the
future, be misparsed as a nested set. Avoid this by escaping the
literal `[` we search for in the regexp.

Reported by Monte Davidoff @mndavidoff

Closes #269.
2022-02-09 15:37:29 +01:00
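The warning described in this commit message can be observed with a small experiment. This is a sketch under assumptions: `compile_collecting_warnings` is a hypothetical helper, the patterns are raw-string equivalents of the ones in the patch, and the behaviour shown is what Python 3.7 and later are expected to do.

```python
import re
import warnings


def compile_collecting_warnings(pattern: bytes):
    # Hypothetical helper: compile a pattern and return it together with
    # the categories of any warnings raised while compiling it.
    re.purge()  # clear re's cache so the pattern is really parsed again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        compiled = re.compile(pattern)
    return compiled, [w.category for w in caught]


# Unescaped '[' directly after the opening bracket of the character set.
old_re, old_warnings = compile_collecting_warnings(rb'([[ ~^:?\\*]|\.\.)')
# The same set with the literal '[' escaped.
new_re, new_warnings = compile_collecting_warnings(rb'([\[ ~^:?\\*]|\.\.)')

print(FutureWarning in old_warnings)  # True
print(FutureWarning in new_warnings)  # False

# Both patterns still replace the same characters.
name = b'feature[1]..x'
print(old_re.sub(b'_', name) == new_re.sub(b'_', name))  # True
```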
Frej Drejhammar
5b7ca5aaec Give proper error message when refusing to overwrite existing branch
If fast-export was asked to export a Mercurial branch to Git and a
branch of the same name already existed in the Git repo but it was not
created by fast export, fast-export would crash while trying to format
an error message claiming that the destination branch was modified
behind its back.

This patch extends fast-export to detect the situation above and give
a proper error message which hopefully is less confusing to the user.

Credit for discovering the original crash goes to Shun-ichi Goto
<gotoh@taiyo.co.jp>.

Closes: #269.
2021-08-27 16:04:40 +02:00
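The two situations this commit message distinguishes can be summarised as a small decision function. This is only an illustrative sketch: `classify_branch` and its return strings are hypothetical, and the conditions mirror the `if not c and sha1` / `elif sha1!=c` checks that the patch adds to `verify_heads`.

```python
def classify_branch(git_sha1, cached_sha1):
    # git_sha1: current head of the branch in the Git repo, or None if the
    #           branch does not exist there.
    # cached_sha1: head recorded by a previous export, or None if the branch
    #              was never exported.
    if not cached_sha1 and git_sha1:
        # Branch exists in Git but there is no record of having created it.
        return 'exists-but-not-created-by-export'
    if git_sha1 != cached_sha1:
        # Branch was exported before, but Git and the cache now disagree.
        return 'modified-outside-export'
    return 'ok'


print(classify_branch(b'abc', None))    # exists-but-not-created-by-export
print(classify_branch(b'abc', b'def'))  # modified-outside-export
print(classify_branch(b'abc', b'abc'))  # ok
```

Handling the first case separately means the second error message is only formatted when a cached value is actually available.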
Frej Drejhammar
4227621eed Update contribution guidelines and make github display them
Try to make it clear that sloppy, throw it over the fence, patches
won't be accepted without revision and try to make sure a potential
contributor sees the warning while creating a pull request.
2021-07-29 15:28:01 +02:00
Frej Drejhammar
bdfc0c08c7 Merge branch 'frej/issue-258'
Closes 258
2021-02-26 16:44:31 +01:00
Frej Drejhammar
001749e69d Merge branch 'PR/260'
Closes 257
2021-02-26 16:40:12 +01:00
SirIntellegence
20c22a3110 Add plugin support for the 'extra' field
Permits plugins to import other information such as svn conversion revisions
2021-02-22 13:09:48 -07:00
Frej Drejhammar
f741bf39f2 bugfix: Avoid starting incremental conversions from scratch
Keys and values in the state cache are byte strings, therefore a
lookup of 'tip' will always fail. The failure makes the conversion
start over from the beginning, but as fast-export is deterministic the
results are the same, just very inefficient. The bug has existed since
the port to Python 3.

This patch switches the 'tip' lookup to use a byte string which should
make incremental conversions restart at the last converted commit. As
'x' == b'x' in Python 2, this should be a backwards compatible change.

Bug reported and fix suggested by Tomas Kolda.

Fixes #258.
2021-02-19 16:47:53 +01:00
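The failed lookup described in this commit message is easy to reproduce on Python 3. The snippet below is a minimal sketch: the dictionary is an illustrative stand-in for the state cache, and the value `b'42'` is made up.

```python
# Illustrative stand-in for the state cache after it has been loaded:
# keys and values are byte strings.
state_cache = {b'tip': b'42'}

# Lookup with a str key: never matches a bytes key on Python 3, so the
# default of 0 is used and the conversion would start from the beginning.
old_min = int(state_cache.get('tip', 0))

# Lookup with a bytes key: finds the stored value.
new_min = int(state_cache.get(b'tip', 0))

print(old_min)  # 0
print(new_min)  # 42
```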
Frej Drejhammar
427663c766 Merge branch 'PR/254' 2021-01-10 15:18:28 +01:00
Ray Luo
056756f193 Remove some ".py" wording
Avoid confusion about which file is the main entry point to fast-export,
in order to avoid the issue mentioned here

https://github.com/frej/fast-export/issues/158#issuecomment-754482516

Also fix a typo
2021-01-09 02:06:52 -08:00
Frej Drejhammar
588e03bb23 Merge branch 'PR/251' 2020-11-15 15:34:27 +01:00
Jason Winnebeck
89da4ad8af Document --ignore-unnamed-heads option 2020-11-14 21:24:54 -05:00
4 changed files with 92 additions and 35 deletions

.github/contributing.md (new file, 28 lines)

@@ -0,0 +1,28 @@
When submitting a patch make sure the commits in your pull request:
* Have good commit messages
Please read Chris Beams' blog post [How to Write a Git Commit
Message](https://chris.beams.io/posts/git-commit/) on how to write a
good commit message. Although the article recommends at most 50
characters for the subject, up to 72 characters are frequently
accepted for fast-export.
* Adhere to good [commit
hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)
When developing a pull request for hg-fast-export, base your work on
the current `master` branch and rebase your work if it no longer can
be merged into the current `master` without conflicts. Never merge
`master` into your development branch, rebase if your work needs
updates from `master`.
When a pull request is modified due to review feedback, please
incorporate the changes into the proper commit. A good reference on
how to modify history is in the [Pro Git book, Section
7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.

@@ -27,10 +27,10 @@ command line option.
## Example
Example mercurial repo folder structure (~/mercurial):
Example mercurial repo folder structure (~/mercurial) containing two subrepos:
src/...
subrepo/subrepo1
subrepo/subrepo2
subrepos/subrepo1
subrepos/subrepo2
### Setup
Create an empty new folder where all the converted git modules will be imported:
@@ -41,18 +41,18 @@ Create an empty new folder where all the converted git modules will be imported:
mkdir submodule1
cd submodule1
git init
hg-fast-export.sh -r ~/mercurial/subrepo1
hg-fast-export.sh -r ~/mercurial/subrepos/subrepo1
cd ..
mkdir submodule2
cd submodule2
git init
hg-fast-export.sh -r ~/mercurial/subrepo2
hg-fast-export.sh -r ~/mercurial/subrepos/subrepo2
### Create mapping file
cd ~/imported-gits
cat > submodule-mappings << EOF
"subrepo/subrepo1"="../submodule1"
"subrepo/subrepo2"="../submodule2"
"subrepos/subrepo1"="../submodule1"
"subrepos/subrepo2"="../submodule2"
EOF
### Convert main repository
@@ -60,16 +60,16 @@ Create an empty new folder where all the converted git modules will be imported:
mkdir git-main-repo
cd git-main-repo
git init
hg-fast-export.sh -r ~/mercurial --subrepo-map=../submodule-mappings
hg-fast-export.sh -r ~/mercurial --subrepo-map=~/imported-gits/submodule-mappings
### Result
The resulting repository will now contain the subrepo/subrepo1 and
subrepo/subrepo1 submodules. The created .gitmodules file will look
like:
The resulting repository will now contain the submodules at the paths
`subrepos/subrepo1` and `subrepos/subrepo2`. The created .gitmodules
file will look like:
[submodule "subrepo/subrepo1"]
path = subrepo/subrepo1
[submodule "subrepos/subrepo1"]
path = subrepos/subrepo1
url = ../submodule1
[submodule "subrepo/subrepo2"]
path = subrepo/subrepo2
[submodule "subrepos/subrepo2"]
path = subrepos/subrepo2
url = ../submodule2

@@ -1,4 +1,4 @@
hg-fast-export.(sh|py) - mercurial to git converter using git-fast-import
hg-fast-export.sh - mercurial to git converter using git-fast-import
=========================================================================
Legal
@@ -133,7 +133,10 @@ is to convert line endings in text files from CRLF to git's preferred LF:
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"
if [ "$3" == "1" ]; then cat; else dos2unix; fi
if [ "$3" == "1" ]; then cat; else dos2unix -q; fi
# The -q option keeps dos2unix from returning an error code
# when it handles text files that are not ASCII-based (such as
# UTF-16 encoded text files)
-- End of crlf-filter.sh --
```
@@ -167,7 +170,7 @@ defined filter methods in the [dos2unix](./plugins/dos2unix) and
[branch_name_in_commit](./plugins/branch_name_in_commit) plugins.
```
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': 'committer'}
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': 'committer', 'extra': extra}
def commit_message_filter(self,commit_data):
```
@@ -198,11 +201,15 @@ Notes/Limitations
hg-fast-export supports multiple branches but only named branches with
exactly one head each. Otherwise commits to the tip of these heads
within the branch will get flattened into merge commits. Chris J
Billington's [hg-export-tool] can help you to handle branches with
duplicate heads.
Alternatively, you can use the [head2branch plugin](./plugins/head2branch)
to create a new named branch from an unnamed head.
within the branch will get flattened into merge commits. There are a
few options to deal with this:
1. Chris J Billington's [hg-export-tool] can help you to handle branches with
duplicate heads.
2. Use the [head2branch plugin](./plugins/head2branch) to create a new named
branch from an unnamed head.
3. You can ignore unnamed heads with the `--ignore-unnamed-heads` option, which
is appropriate in situations such as the extra heads being close commits
(abandoned, unmerged changes).
hg-fast-export will ignore any files or directories tracked by mercurial
called `.git`, and will print a warning if it encounters one. Git cannot
@@ -221,8 +228,8 @@ possible to use hg-fast-export on remote repositories
Design
------
hg-fast-export.py was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: if just feeds what it
hg-fast-export was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: it just feeds what it
finds into git-fast-import. This also implies that it heavily relies
on strictly linear ordering of changesets from hg, i.e. its
append-only storage model so that changesets hg-fast-export already
@@ -258,6 +265,10 @@ hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)
how to modify history is in the [Pro Git book, Section
7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.
Frequent Problems
=================
@@ -301,4 +312,11 @@ Frequent Problems
git it looks like you have deleted all files, when in fact they have
never been checked out. Just do a checkout of the branch you want.
* `Error: repository has at least one unnamed head: hg r<N>`
By design, hg-fast-export cannot deal with extra heads on a branch.
There are a few options depending on whether the extra heads are
in-use/open or normally closed. See [Notes/Limitations](#noteslimitations)
section for more details.
[hg-export-tool]: https://github.com/chrisjbillington/hg-export-tool

@@ -266,7 +266,7 @@ def sanitize_name(name,what="branch", mapping={}):
if not auto_sanitize:
return mapping.get(name,name)
n=mapping.get(name,name)
p=re.compile(b'([[ ~^:?\\\\*]|\.\.)')
p=re.compile(b'([\\[ ~^:?\\\\*]|\.\.)')
n=p.sub(b'_', n)
if n[-1:] in (b'/', b'.'): n=n[:-1]+b'_'
n=b'/'.join([dot(s) for s in n.split(b'/')])
@@ -294,7 +294,7 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
brmap[name]=n
return n
(revnode,_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision,authors,encoding)
(revnode,_,user,(time,timezone),files,desc,branch,extra)=get_changeset(ui,repo,revision,authors,encoding)
if repo[revnode].hidden():
return count
@@ -308,7 +308,7 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
commit_data = {'branch': branch, 'parents': parents,
'author': author, 'desc': desc,
'revision': revision, 'hg_hash': hg_hash,
'committer': user}
'committer': user, 'extra': extra}
for filter in plugins['commit_message_filters']:
filter(commit_data)
branch = commit_data['branch']
@@ -434,9 +434,15 @@ def load_mapping(name, filename, mapping_is_raw):
def process_unicode_escape_sequences(s):
# Replace unicode escape sequences in the otherwise UTF8-encoded bytestring s with
# the UTF8-encoded characters they represent. We need to do an additional
# .decode('utf8').encode('unicode-escape') to convert any non-ascii characters into
# their escape sequences so that the subsequent .decode('unicode-escape') succeeds:
return s.decode('utf8').encode('unicode-escape').decode('unicode-escape').encode('utf8')
# .decode('utf8').encode('ascii', 'backslashreplace') to convert any non-ascii
# characters into their escape sequences so that the subsequent
# .decode('unicode-escape') succeeds:
return (
s.decode('utf8')
.encode('ascii', 'backslashreplace')
.decode('unicode-escape')
.encode('utf8')
)
def parse_quoted_line(line):
m=quoted_regexp.match(line)
@@ -493,7 +499,12 @@ def verify_heads(ui,repo,cache,force,ignore_unnamed_heads,branchesmap):
sanitized_name=sanitize_name(b,"branch",branchesmap)
sha1=get_git_sha1(sanitized_name)
c=cache.get(sanitized_name)
if sha1!=c:
if not c and sha1:
stderr_buffer.write(
b'Error: Branch [%s] already exists and was not created by hg-fast-export, '
b'export would overwrite unrelated branch\n' % b)
if not force: return False
elif sha1!=c:
stderr_buffer.write(
b'Error: Branch [%s] modified outside hg-fast-export:'
b'\n%s (repo) != %s (cache)\n' % (b, b'<None>' if sha1 is None else sha1, c)
@@ -547,7 +558,7 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
except AttributeError:
tip=len(repo)
min=int(state_cache.get('tip',0))
min=int(state_cache.get(b'tip',0))
max=_max
if _max<0 or max>tip:
max=tip
@@ -580,8 +591,8 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
for rev in range(min,max):
c=export_note(ui,repo,rev,c,authors, encoding, rev == min and min != 0)
state_cache['tip']=max
state_cache['repo']=repourl
state_cache[b'tip']=max
state_cache[b'repo']=repourl
save_cache(tipfile,state_cache)
save_cache(mappingfile,mapping_cache)